DATA1220-55, Fall 2024
2024-10-12
By the Central Limit Theorem, the sampling distribution of a statistic (such as a mean or proportion) computed from many sufficiently large samples of a population approximates a normal distribution.
Standard error (SE) is the standard deviation of the sample statistic in a theoretical sampling distribution.
If you took an infinite number of samples from a known distribution, the standard error is the standard deviation of the means of those samples
Describes the scale (i.e. variability, sampling error) of the sampling distribution
As \(n\) increases, the standard error \(SE\) decreases.
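The relationship between \(n\) and \(SE\) can be illustrated with a minimal sketch. It assumes the standard formula for the standard error of a sample mean, \(SE=\sigma/\sqrt{n}\), which these slides do not state explicitly; the value of `sigma` and the function name are hypothetical and chosen only for illustration.

```python
import math


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard error of a sample mean, assuming SE = sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
for n in (10, 100, 1000):
    # Each tenfold increase in n shrinks the SE by a factor of sqrt(10).
    print(n, round(standard_error_of_mean(sigma, n), 3))
```

Because \(n\) sits under a square root, quadrupling the sample size only halves the standard error.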
A Z-score indicates how many standard deviations \(\sigma\) away from the mean \(\mu\) a given observation is.
\[ \begin{aligned} Z&=\frac{\text{observed value}-\text{mean}}{\text{standard deviation}} \\ &= \frac{x-\mu}{\sigma} \end{aligned} \]
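As a small sketch of the Z-score formula, the function below standardizes a single observation. The function name and the example values (an observation of 130 from a population with mean 100 and standard deviation 15) are hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example: 130 is two standard deviations above a mean of 100.
print(z_score(130.0, 100.0, 15.0))
```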
What do we mean when we say that estimates are accurate and/or precise?
A confidence interval is a numerical range, calculated from a sample, that is expected to contain the population parameter with a given probability \(1-\alpha\) (alpha) across theoretical samples from a given population
Properties of known distributions, like the 68-95-99.7 Rule, are used to calculate the bounds of a confidence interval.
A confidence interval is defined as \(\text{point estimate} \pm \text{margin of error}\)
\(\text{margin of error}=Z^* \times SE\)
\(Z^*=\text{Z-Score}_{\alpha / 2}\)
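The link between the 68-95-99.7 Rule and \(Z^*\) can be checked numerically. This is a sketch using `statistics.NormalDist` from the Python standard library; the helper names `coverage_within` and `critical_z` are hypothetical, and `critical_z` returns the positive (upper-tail) value, which is an assumption about sign convention rather than something the slides specify.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k: float) -> float:
    """Probability that a standard normal value falls within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def critical_z(alpha: float) -> float:
    """Positive Z* that leaves alpha / 2 in each tail of the standard normal."""
    return std_normal.inv_cdf(1 - alpha / 2)


for k in (1, 2, 3):
    print(k, round(coverage_within(k), 4))   # roughly 68%, 95%, 99.7%

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3))
```

The rule's "2 standard deviations for 95%" is a rounding of the more exact critical value for \(\alpha=0.05\).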
\(\mathbf{H_0}\): The “Null” Hypothesis
Represents a position of skepticism: nothing is happening here
“There is not an association between process A and B”
\(\mathbf{H_A}\): The “Alternative” Hypothesis
The complement of \(H_0\): something is happening here
“There is an association between process A and B”
Central Limit Theorem for Proportions: sample proportions \(\hat{p}\) will be nearly normally distributed with the mean equal to the population proportion (\(\mu=p\)) and the standard deviation equal to the standard error for a proportion (\(\sigma=SE_p=\sqrt{\frac{p(1-p)}{n}}\)), such that \(\hat{p} \sim N(\mu=p, \sigma=SE_p)\).
Assumptions: observations are independent and identically distributed, with at least 10 successes and at least 10 failures
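A minimal sketch of the standard error for a proportion and the success-failure check follows. The function names are hypothetical, and the check uses expected counts \(np\) and \(n(1-p)\), which is one common reading of the "10+ successes/failures" condition and an assumption here.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1 - p) / n)


def success_failure_ok(p: float, n: int, minimum: int = 10) -> bool:
    """True when both expected successes and expected failures reach the minimum."""
    return n * p >= minimum and n * (1 - p) >= minimum


# Hypothetical values: p = 0.5 with samples of 100 and 10 observations.
print(round(se_proportion(0.5, 100), 3), success_failure_ok(0.5, 100))
print(round(se_proportion(0.5, 10), 3), success_failure_ok(0.5, 10))
```

When the check fails, the normal approximation to the sampling distribution of \(\hat{p}\) is less trustworthy.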
From 1980-2023, 709 tropical cyclones have formed in the Atlantic Ocean. 298 of those tropical cyclones developed into hurricanes, and 72 of those hurricanes made landfall in the continental US.
So far in 2024, 13 tropical cyclones have formed in the Atlantic Ocean. 9 of those tropical cyclones developed into hurricanes, and 2 of those hurricanes made landfall in the continental US.
Research Question: Is a hurricane more likely to hit the continental US in 2024?
What is the study population?
All hurricanes which formed in the Atlantic Ocean with the potential to make landfall in the continental US, for which we have records.
What is the sample population?
298 hurricanes which formed in the Atlantic Ocean between 1980-2023 with the potential to make landfall in the continental United States.
What is the target population?
Future hurricanes which form in the Atlantic Ocean with the potential to make landfall in the continental US.
Is it reasonable to assume that the sample statistics from the data will reliably describe the observed distribution in the sample population?
Is it reasonable to assume that the sample statistics will be a valid estimation of the sampling distribution in the study population?
Is it reasonable to assume that the estimated sampling distribution for the study population will be generalizable to the unobserved distribution in the target population?
Is it reasonable to assume that the population parameters can be modeled using a normal distribution?
If we assume our data is reliable, then our sample statistics will be accurate estimations of the underlying distribution in the sample population.
If we assume our data is valid, then we can use our sample statistics to infer the sampling distribution for the study population.
If we assume our data is generalizable, then we can use our sampling distribution to test the hypothesis in the target population.
Based on the data from 1980-2023, what is the average probability that a hurricane makes landfall in the continental US?
Step 1: Calculate the sample statistic.
\[ \begin{aligned} \hat{p} &= \frac{72}{298} = 0.242 \end{aligned} \]
Step 2: Estimate the sampling distribution.
\[ \begin{aligned} SE&=\sqrt{\frac{0.242(1-0.242)}{298}} = 0.025 \end{aligned} \]
The sampling distribution for \(\hat{p}\) approximates the normal distribution \(N(\mu=0.242, \sigma=0.025)\), that is, a mean of 24.2% with a standard error of 2.5 percentage points.
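Steps 1 and 2 can be reproduced with a short sketch using the 1980-2023 counts given above (72 landfalls out of 298 hurricanes). The variable names are hypothetical.

```python
import math

landfalls = 72     # hurricanes that made landfall in the continental US, 1980-2023
hurricanes = 298   # Atlantic hurricanes, 1980-2023

# Step 1: sample proportion.
p_hat = landfalls / hurricanes

# Step 2: standard error of the sample proportion.
se = math.sqrt(p_hat * (1 - p_hat) / hurricanes)

print(round(p_hat, 3), round(se, 3))
```

The sketch carries the unrounded \(\hat{p}\) into the standard error, whereas the slide plugs in the rounded 0.242; both agree to three decimal places.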
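Step 3: Calculate \(Z^*\) for the significance level \(\alpha=0.05\).
\[ Z^*=Z_{\alpha/2}=Z_{0.025} \approx -1.96 \]
The interval is symmetric, so the magnitude \(|Z^*| \approx 1.96\) sets the margin of error.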
Step 4: Construct a 95% confidence interval.
\(\text{point estimate} \pm Z^* \times SE = 0.242 \pm 1.96 \times 0.025 = 0.242 \pm 0.049\)
With 95% confidence, the probability of a hurricane making landfall in the continental US is between 19.3% and 29.1%.
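The interval in Steps 3 and 4 can be sketched as follows, again using only the Python standard library. Variable names are hypothetical. The sketch keeps unrounded intermediate values, so its bounds may differ from the slide's rounded figures in the last digit.

```python
import math
from statistics import NormalDist

p_hat = 72 / 298                                  # sample proportion, 1980-2023
se = math.sqrt(p_hat * (1 - p_hat) / 298)         # standard error of p_hat

alpha = 0.05
z_star = NormalDist().inv_cdf(1 - alpha / 2)      # positive critical value, about 1.96

margin_of_error = z_star * se
lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"{lower:.1%} to {upper:.1%}")
```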
Step 5: Assume the null hypothesis.
\(H_0\): The probability of a hurricane making landfall in 2024 is 24.2% (\(p=24.2\)%).
\(H_A\): The probability of a hurricane making landfall in 2024 is not 24.2% (\(p \ne 24.2\)%).
Step 6: Calculate the sample statistic.
\[ \begin{aligned} \hat{p} &= \frac{2}{9} \\ &= 0.222 \end{aligned} \]
Step 7: Calculate the test statistic under \(H_0\).
\[ \begin{aligned} Z&=\frac{\hat{p}-p}{SE} \\ &= \frac{22.2 - 24.2}{2.5} \\ &= -0.8 \end{aligned} \]
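The test statistic in Step 7 can be sketched with a small helper. The function name is hypothetical, and the inputs are the rounded values used on the slide, expressed as proportions rather than percentages.

```python
def z_statistic(p_hat: float, p_null: float, se: float) -> float:
    """Distance of the observed proportion from the null value, in standard errors."""
    if se <= 0:
        raise ValueError("se must be positive")
    return (p_hat - p_null) / se


# Rounded slide values: 2024 proportion, null proportion, standard error.
z = z_statistic(0.222, 0.242, 0.025)
print(round(z, 2))
```

Working in proportions or in percentages gives the same \(Z\), because the units cancel in the ratio.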
Step 8: Calculate the p-value under \(H_0\).
Step 9: Reject or fail to reject the null hypothesis.
The p-value for the observed data under the null hypothesis is 0.423. Because the p-value is greater than \(\alpha = 0.05\), the data do not provide sufficient evidence of a difference.
We fail to reject the null hypothesis that the probability of a hurricane making landfall in 2024 is 24.2%.
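Steps 8 and 9 can be sketched as a two-sided p-value calculation followed by a comparison with \(\alpha\). The sketch assumes a two-sided test, matching the "is not 24.2%" alternative, and starts from the rounded test statistic \(Z=-0.8\); variable names are hypothetical.

```python
from statistics import NormalDist

z = -0.8       # test statistic from Step 7 (rounded)
alpha = 0.05   # significance level

# Two-sided p-value: probability of a Z at least this extreme in either tail.
p_value = 2 * NormalDist().cdf(-abs(z))

reject_null = p_value < alpha
print(round(p_value, 4), "reject H0" if reject_null else "fail to reject H0")
```

Using `-abs(z)` keeps the calculation correct whether the observed statistic falls below or above the null value.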
DATA1220-55 Fall 2024, Class 17 | Updated: 2024-10-12